feat: cross-endpoint routing for serverless functions#129

Merged
deanq merged 19 commits into main from
deanq/ae-1348-cross-endpoint-routing
Jan 9, 2026

Member

@deanq deanq commented Jan 3, 2026

Summary

Implement cross-endpoint routing to enable serverless functions to call functions deployed on different endpoints. Functions can now seamlessly execute locally or remotely based on service discovery configuration.

See AE-1348 and Cross_Endpoint_Routing.md for details.

Implementation

Core Components

  • DirectoryClient - HTTP client for mothership directory service to discover endpoint URLs
  • ServiceRegistry - Service discovery layer that loads manifests and routes functions to endpoints
  • ProductionWrapper - Execution router that intercepts stub calls and routes to local or remote endpoints
  • Custom Exceptions - Structured exception hierarchy for better error handling

Key Features

  • Async directory cache guarded by asyncio.Lock so concurrent callers do not race on refresh
  • URL parsing and validation via urllib.parse.urlparse
  • Manifest-based function routing configuration
  • Automatic serialization/deserialization of arguments with cloudpickle
  • Comprehensive error handling with custom exceptions
  • Full integration with existing stub execution layer

Code Quality

  • Centralized configuration constants
  • Complete type hints throughout
  • Thread-safe async operations
  • Comprehensive test coverage with integration tests

Changes

  • src/tetra_rp/runtime/directory_client.py - New HTTP client for mothership API
  • src/tetra_rp/runtime/service_registry.py - New service discovery layer
  • src/tetra_rp/runtime/production_wrapper.py - New execution router
  • src/tetra_rp/runtime/config.py - Centralized configuration
  • src/tetra_rp/runtime/exceptions.py - Custom exception hierarchy
  • src/tetra_rp/stubs/registry.py - Integration with stub layer
  • Tests for cross-endpoint routing functionality

Architecture

Functions are routed based on a manifest that maps function names to resource configurations. The service registry queries the mothership directory to find endpoint URLs, and the production wrapper then decides whether to execute the function locally or make a remote HTTP call to another endpoint.
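The routing decision described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the manifest contents, the `DIRECTORY` mapping, the example URLs, and the `decide_route` helper are all hypothetical, and only the `function_registry` + `resources` layout is taken from the PR's own documentation notes.

```python
from typing import Optional

# Hypothetical manifest: function_registry maps function names to resource names.
MANIFEST = {
    "function_registry": {"resize_image": "gpu_worker", "send_email": "cpu_worker"},
    "resources": {"gpu_worker": {}, "cpu_worker": {}},
}

# Hypothetical directory lookup result: resource name -> endpoint URL.
DIRECTORY = {
    "gpu_worker": "https://example.invalid/gpu",
    "cpu_worker": "https://example.invalid/cpu",
}


def decide_route(function_name: str, current_resource: str) -> Optional[str]:
    """Return None to run locally, or the remote endpoint URL to call."""
    target = MANIFEST["function_registry"].get(function_name)
    # Unknown functions and functions owned by this endpoint run locally.
    if target is None or target == current_resource:
        return None
    # A missing directory entry also yields None, i.e. a local fallback.
    return DIRECTORY.get(target)
```

In this sketch, a function owned by the current resource never triggers a directory lookup, which keeps the local path free of network calls.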

deanq added 11 commits January 3, 2026 09:49

Add DirectoryClient to query mothership endpoint directory with:
- Retry logic with exponential backoff (3 attempts)
- Configurable timeout (10s default)
- Proper error handling and logging
- Async context manager support
- Connection pooling via httpx
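The retry behaviour listed above might look roughly like the sketch below. It is an assumption-laden illustration, not the DirectoryClient itself: it wraps a generic coroutine instead of an httpx request so that it runs with the standard library only, and `fetch_with_retry`, `BASE_DELAY`, and the choice of `ConnectionError` are hypothetical. The 3 attempts and 10s timeout are the defaults named in the commit message.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 3        # attempts named in the commit message
TIMEOUT_SECONDS = 10.0  # default timeout named in the commit message
BASE_DELAY = 0.5        # assumed starting delay; not stated in the PR


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    attempts: int = MAX_ATTEMPTS,
    timeout: float = TIMEOUT_SECONDS,
    base_delay: float = BASE_DELAY,
) -> T:
    """Call fetch() up to `attempts` times, doubling the delay after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(fetch(), timeout=timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"directory unavailable after {attempts} attempts") from last_error
```

Only network-style errors and timeouts are retried here; anything else propagates immediately, which is a design choice of the sketch rather than something the PR states.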
Add ServiceRegistry to manage manifest loading, directory queries, and
routing decisions with:
- Manifest loading from file, env var, or auto-detection
- On-demand directory loading via DirectoryClient with caching
- Cache TTL support (300s default, configurable)
- Function routing decisions (local vs remote)
- Resource and function metadata access
- Graceful degradation if directory unavailable
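A TTL cache with on-demand loading, along the lines described above, could be sketched as follows. The `DirectoryCache` class and its `loader` callback are hypothetical names; the 300s default comes from the commit message, and the `asyncio.Lock` reflects the locking added in a later commit of this PR. Note that `asyncio.Lock` serializes coroutines on one event loop; it does not protect against access from other OS threads.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional

CACHE_TTL_SECONDS = 300.0  # default named in the commit message


class DirectoryCache:
    """Hypothetical cache of resource name -> endpoint URL with a TTL."""

    def __init__(
        self,
        loader: Callable[[], Awaitable[Dict[str, str]]],
        ttl: float = CACHE_TTL_SECONDS,
    ) -> None:
        self._loader = loader
        self._ttl = ttl
        self._lock = asyncio.Lock()
        self._entries: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def _is_fresh(self) -> bool:
        return (
            self._loaded_at is not None
            and time.monotonic() - self._loaded_at < self._ttl
        )

    async def get(self) -> Dict[str, str]:
        if self._is_fresh():
            return self._entries
        async with self._lock:
            # Re-check inside the lock: another task may have refreshed meanwhile.
            if not self._is_fresh():
                try:
                    self._entries = await self._loader()
                    self._loaded_at = time.monotonic()
                except OSError:
                    # Graceful degradation: keep the previous (possibly empty) entries.
                    pass
            return self._entries
```

The second freshness check inside the lock is what keeps several concurrent callers from each triggering their own directory request.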
Add HTTP client for cross-endpoint function execution with:
- Async/sync job submission to RunPod endpoints
- Async job polling with configurable intervals and timeouts
- Cloudpickle serialization/deserialization of arguments
- Authentication via RUNPOD_API_KEY header
- Error handling and response format handling
- Connection pooling via httpx.AsyncClient
- Async context manager support

Add routing wrapper that intercepts stub execution and determines if
function calls should be executed locally or routed to remote endpoints with:
- Function routing decision based on ServiceRegistry
- Automatic directory loading before routing decisions
- Remote execution via HTTP with proper payload construction
- Class method execution support
- Error handling and logging
- Singleton factory pattern for component reuse

Add ProductionWrapper injection to stubs/registry.py to enable cross-endpoint
routing for LiveServerless and CpuLiveServerless resources.

- Check for RUNPOD_ENDPOINT_ID environment variable (production mode indicator)
- Create and inject wrapper around both stubbed_resource and execute_class_method
- Preserve original behavior when not in production
- Graceful fallback if ProductionWrapper import fails
- No changes to public API or user-facing behavior

This enables transparent cross-endpoint function routing while maintaining
full backward compatibility.
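The injection logic described in this commit could be sketched as below. This is a hedged illustration, not the contents of stubs/registry.py: `is_production`, `maybe_wrap`, and the `make_wrapper` factory are hypothetical names, and only the RUNPOD_ENDPOINT_ID check and the fall-back-on-import-failure behaviour come from the commit message.

```python
import os
from typing import Any, Awaitable, Callable

StubCallable = Callable[..., Awaitable[Any]]


def is_production() -> bool:
    """Treat a set RUNPOD_ENDPOINT_ID as the production-mode indicator."""
    return bool(os.environ.get("RUNPOD_ENDPOINT_ID"))


def maybe_wrap(
    stub: StubCallable,
    make_wrapper: Callable[[StubCallable], StubCallable],
) -> StubCallable:
    """Return the stub unchanged outside production, or a routed wrapper inside it."""
    if not is_production():
        return stub  # preserve the original behaviour when not deployed
    try:
        return make_wrapper(stub)
    except ImportError:
        # Graceful fallback if the wrapper module cannot be imported.
        return stub
```

Because the original callable is returned untouched in the non-production branch, local development follows exactly the same code path as before the change.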
Add comprehensive integration tests covering the full routing flow:
- Local function execution (no remote call)
- Remote function execution via HTTP
- On-demand directory loading
- Error propagation from remote endpoints
- Factory creates complete integrated system

These tests validate the entire stack from ServiceRegistry → ProductionWrapper
→ CrossEndpointClient → HTTP execution, ensuring all components work together.

- Remove CrossEndpointClient HTTP client duplication (~250 lines eliminated)
- Add get_resource_for_function() to ServiceRegistry that returns ServerlessResource
- Modify ProductionWrapper to use ServerlessResource.run_sync() for remote execution
- Delete http_client.py and test_http_client.py (replaced by ServerlessResource)
- Update ProductionWrapper tests to mock ServerlessResource instead of HTTP client
- Add unit tests for get_resource_for_function() in ServiceRegistry tests
- Update integration tests to mock ServerlessResource
- Simplify ServerlessResource import (no circular dependency)
- All 405 tests pass with 65% coverage

Remove unnecessary lazy imports from inside fixture - there's no circular
dependency issue. ResourceManager and SingletonMixin don't create circular
imports when imported at module level.

…ements

Add comprehensive lessons learned from the recent refactoring session:

- Add async thread safety pattern to Async Best Practices section
- Add custom exception hierarchies to Error Handling section
- Expand anti-patterns with URL parsing and unreachable code examples
- Add mock alignment lesson to Testing Requirements
- Create new Configuration Patterns section for constant centralization

These lessons reflect improvements made to the cross-endpoint routing feature:
- Thread-safe async cache with asyncio.Lock
- Custom exception hierarchy (RuntimeError → RemoteExecutionError, SerializationError)
- Robust URL parsing with urllib.parse.urlparse
- Centralized configuration in config.py module
- Test mock alignment with actual API contracts

All examples use consistent GOOD / BAD pattern for clarity.

Add custom exception hierarchy and centralized configuration:

- Create exceptions.py with RuntimeError base and domain-specific exceptions
- Create config.py with centralized constants (timeouts, retries, cache TTL)
- Add asyncio.Lock for thread-safe directory cache in ServiceRegistry
- Improve URL parsing with urllib.parse.urlparse and validation
- Fix JobOutput API mismatch: check error field instead of success attribute
- Add serialization error handling with custom SerializationError
- Improve type hints across runtime modules
- Update tests to align with actual API contracts
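As an illustration of the URL validation mentioned in this commit, the sketch below uses `urllib.parse.urlparse` instead of string splitting. The `validate_endpoint_url` helper and its specific rules (http/https only, non-empty host, trailing slash stripped) are assumptions made for the example; the PR does not show what its own validation checks.

```python
from urllib.parse import urlparse


def validate_endpoint_url(url: str) -> str:
    """Return a normalized endpoint URL, or raise ValueError if it is malformed."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported or missing URL scheme: {url!r}")
    if not parsed.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    # Assumed normalization: drop a trailing slash so later joins are predictable.
    return url.rstrip("/")
```

Parsing first and validating the components catches inputs such as a bare hostname without a scheme, which naive splitting on "/" tends to accept silently.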
@deanq deanq requested a review from Copilot January 3, 2026 22:14
Contributor

Copilot AI left a comment

Pull request overview

This PR implements cross-endpoint routing for serverless functions, enabling functions to seamlessly execute locally or remotely based on service discovery configuration. The implementation adds a service discovery layer that queries a mothership directory service to find endpoint URLs and routes function calls accordingly.

Key Changes:

  • New runtime module with HTTP client for mothership directory service
  • Service registry that loads manifests and performs function-to-endpoint routing
  • Production wrapper that intercepts stub calls and routes to local or remote endpoints

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/tetra_rp/runtime/__init__.py | New runtime module initialization |
| src/tetra_rp/runtime/config.py | Centralized configuration constants for HTTP client and caching |
| src/tetra_rp/runtime/exceptions.py | Custom exception hierarchy for runtime errors |
| src/tetra_rp/runtime/directory_client.py | HTTP client implementation for mothership directory API |
| src/tetra_rp/runtime/service_registry.py | Service discovery and routing logic with manifest loading |
| src/tetra_rp/runtime/production_wrapper.py | Execution router that wraps stub calls and handles remote execution |
| src/tetra_rp/stubs/registry.py | Integration with existing stub layer via wrapper injection |
| tests/conftest.py | Moved imports to top for better organization |
| tests/unit/runtime/test_directory_client.py | Unit tests for directory client HTTP operations |
| tests/unit/runtime/test_service_registry.py | Unit tests for service registry routing logic |
| tests/unit/runtime/test_production_wrapper.py | Unit tests for production wrapper execution routing |
| tests/integration/test_cross_endpoint_routing.py | Integration tests for complete routing flow |

Comment thread src/tetra_rp/runtime/exceptions.py Outdated
… builtin

The custom RuntimeError class in runtime.exceptions was shadowing Python's
built-in RuntimeError, creating ambiguity. Renamed to FlashRuntimeError as
the base exception class for all cross-endpoint runtime errors.

Derived exceptions (RemoteExecutionError, SerializationError, ManifestError,
DirectoryUnavailableError) now inherit from FlashRuntimeError.

Addresses Copilot review feedback on PR #129.
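The hierarchy after this rename might be laid out roughly as follows. The class names are the ones listed in the commit message; deriving the base from `Exception` and the docstrings are assumptions, since the contents of exceptions.py are not shown here.

```python
class FlashRuntimeError(Exception):
    """Base class for cross-endpoint runtime errors (assumed to derive from Exception)."""


class RemoteExecutionError(FlashRuntimeError):
    """A call routed to a remote endpoint failed."""


class SerializationError(FlashRuntimeError):
    """Arguments or results could not be serialized or deserialized."""


class ManifestError(FlashRuntimeError):
    """The manifest is missing or malformed."""


class DirectoryUnavailableError(FlashRuntimeError):
    """The mothership directory could not be reached."""
```

With a distinct base name, `except FlashRuntimeError` catches only these routing errors, and `except RuntimeError` keeps its usual built-in meaning.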
deanq and others added 4 commits January 4, 2026 22:48

Add detailed documentation for PR #129 covering:

User Guide:
- Quick start with manifest and environment setup
- Configuration guide with manifest structure explanation
- Usage patterns for microservice architecture, mixed local/remote, and fallback scenarios
- Error handling and serialization guidelines

Contributor Guide:
- Architecture overview with data flow diagrams
- Core component documentation (ProductionWrapper, ServiceRegistry, DirectoryClient, Exceptions)
- Integration points with stub layer and ResourceManager
- Design decision rationale
- Extension points for serialization, directory backends, and routing policies
- Testing strategy and debugging approaches

Documentation is verified against actual code implementation with:
- Correct manifest format (function_registry + resources structure)
- Accurate method names and signatures
- Proper exception hierarchy (FlashRuntimeError base class)
- Correct HTTP library (httpx, not aiohttp)
- Accurate configuration constants and defaults

Comment thread src/tetra_rp/runtime/production_wrapper.py Outdated
Comment thread src/tetra_rp/runtime/directory_client.py Outdated
Comment thread src/tetra_rp/runtime/service_registry.py
Comment thread src/tetra_rp/runtime/service_registry.py
deanq added 3 commits January 8, 2026 16:08

Create new src/tetra_rp/runtime/serialization.py with reusable functions
for cloudpickle + base64 encoding/decoding to eliminate duplication across
6 production files:
- serialize_arg(), serialize_args(), serialize_kwargs()
- deserialize_arg(), deserialize_args(), deserialize_kwargs()

This addresses the PR #129 comment to refactor duplicated serialization
code. All serialization now goes through a single, consistent interface
with proper error handling via SerializationError.

Updated files:
- production_wrapper.py: Use serialize_args/kwargs
- live_serverless.py: Use serialize_args/kwargs
- execute_class.py: Use serialize_args/kwargs for constructor and method args
- generic_handler.py: Use deserialize/serialize utilities
- lb_handler.py: Use deserialize/serialize for /execute endpoint
- load_balancer_sls.py: Use serialize/deserialize for HTTP-based stub

All 581 tests passing. Code coverage: 65.37%.
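A cloudpickle + base64 helper set of the kind this commit describes could be sketched as below. The function names match those listed in the commit message, but the bodies are assumptions, `SerializationError` is redefined locally so the example is self-contained, and the fallback to the standard `pickle` module exists only so the sketch runs where cloudpickle is not installed.

```python
import base64
from typing import Any, Dict, List, Sequence

try:
    import cloudpickle as pickler  # used by the PR
except ImportError:  # assumption: fall back so the sketch runs without the dependency
    import pickle as pickler


class SerializationError(Exception):
    """Local stand-in for the PR's SerializationError."""


def serialize_arg(value: Any) -> str:
    """Pickle a value and encode it as ASCII-safe base64 text."""
    try:
        return base64.b64encode(pickler.dumps(value)).decode("ascii")
    except Exception as exc:
        raise SerializationError(f"failed to serialize argument: {exc}") from exc


def deserialize_arg(encoded: str) -> Any:
    """Reverse serialize_arg: base64-decode, then unpickle."""
    try:
        return pickler.loads(base64.b64decode(encoded))
    except Exception as exc:
        raise SerializationError(f"failed to deserialize argument: {exc}") from exc


def serialize_args(args: Sequence[Any]) -> List[str]:
    return [serialize_arg(arg) for arg in args]


def serialize_kwargs(kwargs: Dict[str, Any]) -> Dict[str, str]:
    return {key: serialize_arg(value) for key, value in kwargs.items()}


def deserialize_args(encoded: Sequence[str]) -> List[Any]:
    return [deserialize_arg(item) for item in encoded]


def deserialize_kwargs(encoded: Dict[str, str]) -> Dict[str, Any]:
    return {key: deserialize_arg(value) for key, value in encoded.items()}
```

Base64 keeps the pickled bytes safe to embed in a JSON payload. As with any pickle-based format, payloads should only be deserialized when they come from a trusted endpoint.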
Rename DirectoryClient to ManifestClient to better reflect its purpose as
the manifest directory service (endpoint registry) rather than a generic
directory. This addresses PR #129 comment regarding naming clarity.

Changes:
- Rename src/tetra_rp/runtime/directory_client.py to manifest_client.py
- Rename class DirectoryClient -> ManifestClient
- Rename exception DirectoryUnavailableError -> ManifestServiceUnavailableError
- Update all imports and references in:
  - service_registry.py
  - exceptions.py
  - All test files (test_manifest_client.py, test_service_registry.py, test_cross_endpoint_routing.py)

The manifest directory service fetches an endpoint registry that maps
resource_config names to their deployment URLs from the mothership API.

All 581 tests passing. Code coverage: 65.37%.

Create new src/tetra_rp/runtime/models.py with Pydantic-inspired dataclasses:
- FunctionMetadata: Function definition with name, module, async status, HTTP routing
- ResourceConfig: Resource configuration with type, handler, and functions
- Manifest: Top-level manifest with version, project name, function registry, resources

This addresses the PR #129 comment to improve manifest type safety and IDE support.

Changes:
- ServiceRegistry now loads manifests into Manifest objects
- Maintains backward compatibility with dict-based manifests in handler generators
- Updated get_all_resources() and get_resource_functions() to convert to dicts
- Updated HandlerGenerator and LBHandlerGenerator to work with both dict and Manifest
- Updated test fixtures to use attribute access instead of dict access

Manifest.to_dict() allows serialization to JSON, and Manifest.from_dict()
allows deserialization from JSON.
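The typed manifest could be modelled along these lines. The three class names and the `to_dict()` / `from_dict()` methods come from the commit message; every field name below (`is_async`, `resource_type`, `handler_file`, and so on) is an assumption chosen to match the prose description, not the actual contents of models.py.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class FunctionMetadata:
    name: str
    module: str
    is_async: bool = False  # assumed field name


@dataclass
class ResourceConfig:
    resource_type: str  # assumed field name
    handler_file: str   # assumed field name
    functions: List[FunctionMetadata] = field(default_factory=list)


@dataclass
class Manifest:
    version: str
    project_name: str
    function_registry: Dict[str, str] = field(default_factory=dict)
    resources: Dict[str, ResourceConfig] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses, dicts, and lists.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Manifest":
        resources = {
            name: ResourceConfig(
                resource_type=cfg["resource_type"],
                handler_file=cfg["handler_file"],
                functions=[FunctionMetadata(**fn) for fn in cfg.get("functions", [])],
            )
            for name, cfg in data.get("resources", {}).items()
        }
        return cls(
            version=data["version"],
            project_name=data["project_name"],
            function_registry=dict(data.get("function_registry", {})),
            resources=resources,
        )
```

Attribute access such as `manifest.resources["gpu_worker"].functions` is what gives the IDE and type checker something to work with, compared with nested dict lookups.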

All 581 tests passing. Code coverage: 65.68%.

@deanq deanq merged commit 57ff437 into main Jan 9, 2026
7 checks passed
@deanq deanq deleted the deanq/ae-1348-cross-endpoint-routing branch January 9, 2026 03:55
3 participants